A Stepping Stone towards Visualization of Data

Hello everyone! This is my first ever blog post, and I am very excited about it. Writing and setting up a blog had been on my mind for a while, and finally, after many failed attempts and some excusable bouts of procrastination, I managed to write one.

This post is about a very important topic, one that I believe has a huge impact on what separates a good analysis from a great one.

“Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.” - Stephen Few

The post will try to cover these aspects:
- An attempt to understand the visualization process put forward by Ben Fry
- Visualizing airport locations around the world
- A visual study of flight routes in and around India

Stages To Visual Information

My interest in generative art initially drew me towards Ben Fry, as a way to develop an understanding of how to present data more meaningfully. His book Visualizing Data provides a seven-step process for creating a narrative that originates from data.

Let’s cover each step one by one. We will soon realise that these aren’t so much incremental steps as intertwined processes that we return to again and again.

Locating Airports around the World.

Acquire

Most data visualizations originate from a question. It is important to have one, as it strips away unnecessary constructs and keeps the work focused on a precise answer.

Where are the Airports around the world located?

Let’s acquire the data.

Before that, I’ll load the packages we need.

library(XML)
library(ggplot2)
library(tidyr)
library(dplyr)
library(sp)
library(geosphere)
library('maps')
library('ggthemes')
library('plotly')

After a few minutes of Google searching, I found a document that has the coordinates of airport locations around the globe. Let’s load that.

# tbl_df() is deprecated in current dplyr; tibble() builds the same one-column table
A_loc<-tibble(value = readLines("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"))
head(A_loc)
## # A tibble: 6 x 1
##                                                                         value
##                                                                         <chr>
## 1 "1,\"Goroka Airport\",\"Goroka\",\"Papua New Guinea\",\"GKA\",\"AYGA\",-6.0
## 2 "2,\"Madang Airport\",\"Madang\",\"Papua New Guinea\",\"MAG\",\"AYMD\",-5.2
## 3 "3,\"Mount Hagen Kagamuga Airport\",\"Mount Hagen\",\"Papua New Guinea\",\"
## 4 "4,\"Nadzab Airport\",\"Nadzab\",\"Papua New Guinea\",\"LAE\",\"AYNZ\",-6.5
## 5 "5,\"Port Moresby Jacksons International Airport\",\"Port Moresby\",\"Papua
## 6 "6,\"Wewak International Airport\",\"Wewak\",\"Papua New Guinea\",\"WWK\",\

The data is quite messy, but we have passed the first stage: we have acquired the data. If we look carefully, we can already see the country and the coordinates of each airport.

Parse

The next step is to give the acquired data a structure and to arrange it in an order that makes sense to us. An easy way to test whether our data has structure is to look at the parsed dataset and see if one can mentally “plot” something out of it.

I separated the entire dataset into the required columns, as described in the documentation.

New_A_loc<-as.data.frame(sapply(A_loc, function(x) gsub("\"", "", x)))
New_A_loc<-separate(data = New_A_loc, col = value, into = c("Airport_id", "Name","City","Country","IATA","ICAO","Lat","Long","Alt","Timezone","DST","TZ","Type","Source"), sep = ",")

New_A_loc$Lat <- as.numeric(New_A_loc$Lat)
New_A_loc$Long <- as.numeric(New_A_loc$Long)
New_A_loc$Alt<-as.numeric(New_A_loc$Alt)

head(New_A_loc)
##   Airport_id                                        Name         City
## 1          1                              Goroka Airport       Goroka
## 2          2                              Madang Airport       Madang
## 3          3                Mount Hagen Kagamuga Airport  Mount Hagen
## 4          4                              Nadzab Airport       Nadzab
## 5          5 Port Moresby Jacksons International Airport Port Moresby
## 6          6                 Wewak International Airport        Wewak
##            Country IATA ICAO       Lat    Long  Alt Timezone DST
## 1 Papua New Guinea  GKA AYGA -6.081690 145.392 5282       10   U
## 2 Papua New Guinea  MAG AYMD -5.207080 145.789   20       10   U
## 3 Papua New Guinea  HGU AYMH -5.826790 144.296 5388       10   U
## 4 Papua New Guinea  LAE AYNZ -6.569803 146.726  239       10   U
## 5 Papua New Guinea  POM AYPY -9.443380 147.220  146       10   U
## 6 Papua New Guinea  WWK AYWK -3.583830 143.669   19       10   U
##                     TZ    Type      Source
## 1 Pacific/Port_Moresby airport OurAirports
## 2 Pacific/Port_Moresby airport OurAirports
## 3 Pacific/Port_Moresby airport OurAirports
## 4 Pacific/Port_Moresby airport OurAirports
## 5 Pacific/Port_Moresby airport OurAirports
## 6 Pacific/Port_Moresby airport OurAirports

Looks good! The data now has a specific structure: each airport has an ID, a geo-location, a name, a city, and so on.
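
One caveat with splitting on every comma: a quoted airport name that itself contains a comma would spill into the next column. Below is a minimal sketch of an alternative, assuming base R’s read.csv() is acceptable here. The second sample row is made up for illustration, and the column names are simply the ones used in this post.

```r
# Column names as used in this post
cols <- c("Airport_id", "Name", "City", "Country", "IATA", "ICAO",
          "Lat", "Long", "Alt", "Timezone", "DST", "TZ", "Type", "Source")

# Two rows in the airports.dat layout. The second row is hypothetical and
# has a comma inside the quoted name, plus a \N missing-value marker.
sample_lines <- c(
  '1,"Goroka Airport","Goroka","Papua New Guinea","GKA","AYGA",-6.08169,145.392,5282,10,"U","Pacific/Port_Moresby","airport","OurAirports"',
  '2,"Example Field, North","Example City","Nowhere","XXX","XXXX",1.5,2.5,100,0,"U",\\N,"airport","Example"'
)

# read.csv() respects the quotes, so the embedded comma stays inside Name,
# and na.strings turns the \N marker into NA while reading
parsed <- read.csv(text = sample_lines, header = FALSE, col.names = cols,
                   na.strings = "\\N", stringsAsFactors = FALSE)

parsed[, c("Name", "Lat", "Long", "TZ")]
```

With this approach the numeric columns come back as numbers directly, so the separate as.numeric() conversions would not be needed.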

As mentioned before, although there are 7 steps, no rule says that we need to follow them in a particular order or use all of them. They provide a framework for working with numbers and giving them a different persona.

Now that we have a proper dataset, I will jump directly to representing it on a map.

Represent

This is where we will have our first visualization: a scatterplot of airport locations on a world map. Let’s work on that.

I prefer ggplot2 over the base plotting system. It’s more resourceful and functionally accessible.

We have the latitude and longitude of each airport. ggplot2 has a borders() function that draws map outlines, so we can pinpoint coordinates precisely on a world map. We can customize it further to our liking too!

world <- ggplot() +
  borders("world", colour = "#3e3e40", fill = "#3e3e40") +
  theme(panel.background = element_rect(fill = "#252526", colour = "#252526"),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
map <- world +
  geom_point(aes(x = Long, y = Lat,
                 text = paste('City: ', City,
                              '<br /> Name : ', Name),
                 ID = Airport_id),
             data = New_A_loc, colour = "#ebe6ed", alpha=1/4,size=0.4)+labs("Airport")

map

Looks good! So many Airports!!

Filter

Sometimes it’s better to be precise with limited data than to be inaccurate with a massive dataset. Filtering removes what is not useful, or rather what doesn’t have much impact on the overall visualization. I will filter for airports that are located within India.
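
A minimal sketch of such a filter, using a small toy table in place of New_A_loc so that it runs on its own. The airport names and rounded coordinates for Delhi and Mumbai are approximations I typed in by hand, not values taken from the dataset.

```r
# Toy stand-in for New_A_loc; coordinates are approximate
airports <- data.frame(
  Name    = c("Goroka Airport", "Delhi airport", "Mumbai airport"),
  Country = c("Papua New Guinea", "India", "India"),
  IATA    = c("GKA", "DEL", "BOM"),
  Lat     = c(-6.08, 28.57, 19.09),
  Long    = c(145.39, 77.10, 72.87),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Country is India
india <- subset(airports, Country == "India")

# To match several values at once, %in% is the safe choice;
# == against a vector recycles it and can silently miss rows
two_hubs <- subset(airports, IATA %in% c("DEL", "BOM"))

india
```

The same subset() call should work on the full airport table, assuming the Country column is spelled the same way there.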

Interact

Moving on. The last step is usually tricky and is sometimes quite underrated, but making a visual graphic interactive has a profound impact. Giving the user control over the visual functionality provides both depth and breadth to the mechanics of a plot.

For that we will use the plotly package in R.

ggplotly(map, tooltip = c('text', 'ID'))

And finally we have our first visualization. The aesthetics can be improved further, I guess. I am bad with color, I can clearly see that.

Tracing Flight Routes

I found Aaron Koblin’s Flight Patterns to be an amazing masterpiece, so simple yet so informative. So this is my short attempt to replicate his work using just the routes and not the entire plane schedule (I was not able to access that).

Mine and Filter

There is an important concept called the great circle: the shortest path between two points on a sphere, which gives airlines a route to follow from one point to another. Mine is the step that brings in mathematics and statistics to uncover more detail about the data.

Here I will consider only 2 main airports to observe a route: Delhi and Mumbai.

  # %in% matches either code on every row; == would recycle c("DEL","BOM") and can miss rows
  dat_point<-subset(New_A_loc,IATA %in% c("DEL","BOM"),select=c(Long,Lat))
  gg4<-map+coord_cartesian(ylim = c(0,60),xlim=c(40,120))+geom_point(data=dat_point,aes(x=Long, y=Lat),color="#ebe6ed",alpha=1/5,size=2)
  

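  # gcIntermediate() returns a matrix with lon/lat columns; as.data.frame() replaces the deprecated tbl_df()
  l<-as.data.frame(gcIntermediate(c(dat_point$Long[1],dat_point$Lat[1]),c(dat_point$Long[2],dat_point$Lat[2]),n=100,addStartEnd = TRUE,sp=FALSE))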
  gg5<-gg4+geom_line(data=l,aes(x=lon,y=lat),color="white")
  gg5
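
To make the great-circle idea a little more concrete, here is a minimal sketch of the haversine formula in base R, which gives the great-circle distance between two points on a sphere. The helper name haversine_km, the Earth radius of 6371 km, and the rounded Delhi and Mumbai coordinates are my own assumptions for illustration; they are not taken from the dataset.

```r
# Great-circle distance in km via the haversine formula,
# treating the Earth as a sphere (an approximation)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Approximate coordinates for Delhi (DEL) and Mumbai (BOM)
del <- c(lon = 77.10, lat = 28.57)
bom <- c(lon = 72.87, lat = 19.09)

d <- haversine_km(del[["lon"]], del[["lat"]], bom[["lon"]], bom[["lat"]])
d
```

gcIntermediate() does the related job of returning points along that path, which is what the line on the map is drawn from.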

So far we have managed to cover 6 of the 7 steps in the visualization of data.

But again, the process is not unidirectional.

Acquire

# tibble() replaces the deprecated tbl_df(); the single column is named "value"
routes<-tibble(value = readLines("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"))
routes<-separate(data = routes, col = value, into = c("Airline", "Airline_iD","Source_airport","Source_airport_id","Destination_airport","Destination_airport_id","Codeshare","Stops","Equipment"), sep = ",")

Refine

The refine step covers improving the visual features to clarify the representation.

routes[ routes == "\\N" ] <- NA

Routes_source<-routes[,4]
names(Routes_source)<-"Airport_id"
Routes_destination<-routes[,6]
names(Routes_destination)<-"Airport_id"

Filter

Airport<-New_A_loc[,c(1,4,7,8)]

d1<-left_join(Routes_source,Airport,by="Airport_id")
d2<-left_join(Routes_destination,Airport,by="Airport_id")
Df<-cbind(d1,d2)
names(Df)<-c("Airport_id_in","Country_in","Lat_in","Long_in","Airport_id_out","Country_out","Lat_out","Long_out")
Df<-tbl_df(Df[complete.cases(Df),])

In<-subset(Df, Country_in =="India")
Out<-subset(Df,Country_out =="India")
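
The lookup-and-pair logic above can be sketched on toy data. This version swaps left_join() for base R’s match() so that it runs without any packages, and it assumes each Airport_id appears only once in the airport table. All names and values here are made up for illustration.

```r
# Toy airport table and toy route endpoints (IDs kept as character)
airport <- data.frame(
  Airport_id = c("1", "2", "3"),
  Country    = c("Papua New Guinea", "India", "India"),
  stringsAsFactors = FALSE
)
routes_toy <- data.frame(
  source_id = c("2", "1", "9"),   # "9" has no match in the airport table
  dest_id   = c("3", "2", "1"),
  stringsAsFactors = FALSE
)

# match() gives, for each route, the matching row of the airport table (or NA)
src <- airport[match(routes_toy$source_id, airport$Airport_id), ]
dst <- airport[match(routes_toy$dest_id,   airport$Airport_id), ]

# Both lookups keep the route order, so pairing them side by side is safe
pairs <- data.frame(Country_in = src$Country, Country_out = dst$Country,
                    stringsAsFactors = FALSE)

# Drop routes with an unmatched endpoint, then keep those whose source is in India
pairs <- pairs[complete.cases(pairs), ]
from_india <- subset(pairs, Country_in == "India")
from_india
```

Note that the "_in" columns describe the source airport of a route and the "_out" columns the destination, following the naming used in this post.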

## Routes whose source airport is in India (flights departing from India)

Mine

  l<-gcIntermediate(cbind(In$Long_in,In$Lat_in),cbind(In$Long_out,In$Lat_out),n=100,addStartEnd = TRUE,sp=TRUE)
  
  d_l<-SpatialLinesDataFrame(l,
                            data.frame(A_id_in = In$Airport_id_in,
                                       A_id_out = In$Airport_id_out,
                                       stringsAsFactors = FALSE))
  d_l_df <- fortify(d_l)
  gg6<-world+geom_path(data=d_l_df,aes(long, lat , group = group),alpha=0.05,color="white")
  
  gg6_t<-map+geom_path(data=d_l_df,aes(long, lat , group = group),alpha=0.05,color="white")
# Routes whose destination airport is in India (flights arriving in India)
  l<-gcIntermediate(cbind(Out$Long_in,Out$Lat_in),cbind(Out$Long_out,Out$Lat_out),n=100,addStartEnd = TRUE,sp=TRUE)
  
  d_l<-SpatialLinesDataFrame(l,
                             data.frame(A_id_in = Out$Airport_id_in,
                                              A_id_out = Out$Airport_id_out,
                                              stringsAsFactors = FALSE))
  
  d_l_df <- fortify(d_l)

Represent

 gg6<- gg6+geom_path(data=d_l_df,aes(long, lat , group = group),alpha=0.05,color="white")

gg6+coord_cartesian(ylim = c(0,60),xlim=c(40,120))

Interact

ggplotly(gg6_t, tooltip = c('text', 'ID'))

Not exactly what I had in mind, but it somewhat imitates what Aaron Koblin achieved.

I hope you liked my first attempt at writing a blog post and, most of all, my attempt to explain the basic framework for visualization. Let me know what you think.